{ "cells": [ { "cell_type": "markdown", "id": "f1ec66ee", "metadata": {}, "source": [ "# Lesson 8 - Association analysis\n", "\n", "\n", "![](img/association.png)\n", "\n", "\n", "## 8.1 The apyori package\n", "We will use the `apyori` package to run the analyses. To install it:\n", "\n", "`>> pip install apyori`" ] }, { "cell_type": "code", "execution_count": 1, "id": "f62d3ef9", "metadata": {}, "outputs": [], "source": [ "from apyori import apriori" ] }, { "cell_type": "markdown", "id": "881e76b4", "metadata": {}, "source": [ "The package's apriori algorithm expects the data as a list of lists, with one inner list per transaction. Use the example below as a template:" ] }, { "cell_type": "markdown", "id": "79f5d4a0", "metadata": {}, "source": [ "## 8.2 A simple example" ] }, { "cell_type": "code", "execution_count": 2, "id": "14e16c25", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RelationRecord(items=frozenset({'açucar'}), support=0.6666666666666666, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'açucar'}), confidence=0.6666666666666666, lift=1.0)])\n" ] } ], "source": [ "records = [[\"leite\",\"pao\",\"manteiga\",\"açucar\"],\n", " [\"leite\",\"pao\",\"manteiga\"],\n", " [\"leite\",\"pao\",\"coca\",\"açucar\"]]\n", "association_rules = apriori(records, min_support=0.0001, min_confidence=0.0, min_lift=0, min_length=1)\n", "association_results = list(association_rules)\n", "print(association_results[0])" ] }, { "cell_type": "markdown", "id": "81d3e482", "metadata": {}, "source": [ "## 8.3 Importing a dataset\n", "We use the `store_data.csv` dataset, in which each row contains the items purchased at a store in France:" ] }, { "cell_type": "code", "execution_count": 3, "id": "4fc5db29", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>10</th><th>11</th><th>12</th><th>13</th><th>14</th><th>15</th><th>16</th><th>17</th><th>18</th><th>19</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>shrimp</td><td>almonds</td><td>avocado</td><td>vegetables mix</td><td>green grapes</td><td>whole weat flour</td><td>yams</td><td>cottage cheese</td><td>energy drink</td><td>tomato juice</td><td>low fat yogurt</td><td>green tea</td><td>honey</td><td>salad</td><td>mineral water</td><td>salmon</td><td>antioxydant juice</td><td>frozen smoothie</td><td>spinach</td><td>olive oil</td></tr>\n",
"<tr><th>1</th><td>burgers</td><td>meatballs</td><td>eggs</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>2</th><td>chutney</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>3</th><td>turkey</td><td>avocado</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>4</th><td>mineral water</td><td>milk</td><td>energy bar</td><td>whole wheat rice</td><td>green tea</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"</tbody>\n",
"</table>\n
" ], "text/plain": [ " 0 1 2 3 4 \\\n", "0 shrimp almonds avocado vegetables mix green grapes \n", "1 burgers meatballs eggs NaN NaN \n", "2 chutney NaN NaN NaN NaN \n", "3 turkey avocado NaN NaN NaN \n", "4 mineral water milk energy bar whole wheat rice green tea \n", "\n", " 5 6 7 8 9 \\\n", "0 whole weat flour yams cottage cheese energy drink tomato juice \n", "1 NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN NaN \n", "\n", " 10 11 12 13 14 15 \\\n", "0 low fat yogurt green tea honey salad mineral water salmon \n", "1 NaN NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN NaN NaN \n", "\n", " 16 17 18 19 \n", "0 antioxydant juice frozen smoothie spinach olive oil \n", "1 NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "dt = pd.read_csv(r\"G:\\Meu Drive\\Arquivos\\UFPR\\Disciplinas\\2 - Intro Mineração de Dados\\Python\\Datasets\\store_data.csv\", header = None)\n", "dt.head()" ] }, { "cell_type": "markdown", "id": "cb1d5cf0", "metadata": {}, "source": [ "## 8.4 Applying the algorithm\n", "We need to build a list of lists with the items, dropping the `NaN` padding from each row." 
] }, { "cell_type": "code", "execution_count": 4, "id": "58215df3", "metadata": {}, "outputs": [], "source": [ "# One inner list per transaction (row), without the NaN padding\n", "vv_itens = []\n", "for linha in range(0, dt.shape[0]):\n", "    vv_itens.append(dt.iloc[linha].dropna().tolist())\n" ] }, { "cell_type": "markdown", "id": "527f8987", "metadata": {}, "source": [ "We apply the algorithm with the following parameters:\n", "\n", "`min_support`: minimum support for the rules.\n", "\n", "`min_confidence`: minimum confidence for the rules.\n", "\n", "`min_lift`: minimum lift for the rules.\n", "\n", "`min_length`: minimum number of items a rule must contain. Note that the keyword arguments documented by `apyori` are `min_support`, `min_confidence`, `min_lift` and `max_length`; `min_length` is not among them and appears to be silently ignored. In the call below, single-item results are excluded by `min_lift = 3` instead, because a rule with an empty antecedent always has lift 1." ] }, { "cell_type": "code", "execution_count": 5, "id": "6f1e0439", "metadata": {}, "outputs": [], "source": [ "association_rules = apriori(vv_itens, min_support = 0.0045, min_confidence = 0.2, min_lift=3, min_length = 2)\n", "\n", "# Convert the results to a list to make them easier to inspect\n", "resultados = list(association_rules)" ] }, { "cell_type": "markdown", "id": "aa470d42", "metadata": {}, "source": [ "The result is a list in which each element (a `RelationRecord`) holds the information about one rule. Each element can itself be indexed like a list. Its first element is a `frozenset` with all the products in the rule (antecedent and consequent together); it can be converted to a list to unpack the values, for example with `list(resultados[0][0])`." ] }, { "cell_type": "markdown", "id": "a83060b7", "metadata": {}, "source": [ "The second element of the inner list stores the support of the rule:" ] }, { "cell_type": "code", "execution_count": 6, "id": "daf589b2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Suporte : 0.004532728969470737\n" ] } ], "source": [ "regra = 0\n", "print(\"Suporte :\", resultados[regra][1])" ] }, { "cell_type": "markdown", "id": "8c6670b6", "metadata": {}, "source": [ "The third element of the inner list is another list, holding elements of type `OrderedStatistic`; the code below reads only the first one. An `OrderedStatistic` can also be indexed like a list: its first and second elements are the antecedent and the consequent items of the rule, respectively. We can extract them as follows:" ] }, { "cell_type": "code", "execution_count": 7, "id": "ea4178ad", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['light cream'] -> ['chicken']\n" ] } ], "source": [ "regra = 0\n", "ant = [x for x in resultados[regra][2][0][0]]\n", "cons = [x for x in resultados[regra][2][0][1]]\n", "print(ant, \"->\", cons)" ] }, { "cell_type": "markdown", "id": "b780ed7b", "metadata": {}, "source": [ "The third and fourth elements of the same `OrderedStatistic` store the confidence and the lift:" ] }, { "cell_type": "code", "execution_count": 8, "id": "71370828", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Confiança : 0.29059829059829057\n", "Lift : 4.84395061728395\n" ] } ], "source": [ "print(\"Confiança : \",resultados[regra][2][0][2])\n", "print(\"Lift : \",resultados[regra][2][0][3])" ] }, { "cell_type": "markdown", "id": "d18871b8", "metadata": {}, "source": [ "Putting it all together, we can print this information for every rule returned by the algorithm:" ] }, { 
"cell_type": "code", "execution_count": 9, "id": "ef5ac116", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['light cream'] -> ['chicken']\n", "Suporte : 0.004532728969470737\n", "Confiança : 0.29059829059829057\n", "Lift : 4.84395061728395\n", "====================================\n", "['mushroom cream sauce'] -> ['escalope']\n", "Suporte : 0.005732568990801226\n", "Confiança : 0.3006993006993007\n", "Lift : 3.790832696715049\n", "====================================\n", "['pasta'] -> ['escalope']\n", "Suporte : 0.005865884548726837\n", "Confiança : 0.3728813559322034\n", "Lift : 4.700811850163794\n", "====================================\n", "['herb & pepper'] -> ['ground beef']\n", "Suporte : 0.015997866951073192\n", "Confiança : 0.3234501347708895\n", "Lift : 3.2919938411349285\n", "====================================\n", "['tomato sauce'] -> ['ground beef']\n", "Suporte : 0.005332622317024397\n", "Confiança : 0.3773584905660377\n", "Lift : 3.840659481324083\n", "====================================\n", "['whole wheat pasta'] -> ['olive oil']\n", "Suporte : 0.007998933475536596\n", "Confiança : 0.2714932126696833\n", "Lift : 4.122410097642296\n", "====================================\n", "['pasta'] -> ['shrimp']\n", "Suporte : 0.005065991201173177\n", "Confiança : 0.3220338983050847\n", "Lift : 4.506672147735896\n", "====================================\n", "['chocolate', 'frozen vegetables'] -> ['shrimp']\n", "Suporte : 0.005332622317024397\n", "Confiança : 0.23255813953488375\n", "Lift : 3.2545123221103784\n", "====================================\n", "['ground beef', 'cooking oil'] -> ['spaghetti']\n", "Suporte : 0.004799360085321957\n", "Confiança : 0.5714285714285714\n", "Lift : 3.2819951870487856\n", "====================================\n", "['spaghetti', 'frozen vegetables'] -> ['ground beef']\n", "Suporte : 0.008665511265164644\n", "Confiança : 0.31100478468899523\n", "Lift : 3.165328208890303\n", 
"====================================\n", "['milk', 'frozen vegetables'] -> ['olive oil']\n", "Suporte : 0.004799360085321957\n", "Confiança : 0.20338983050847456\n", "Lift : 3.088314005352364\n", "====================================\n", "['mineral water', 'shrimp'] -> ['frozen vegetables']\n", "Suporte : 0.007199040127982935\n", "Confiança : 0.30508474576271183\n", "Lift : 3.200616332819722\n", "====================================\n", "['spaghetti', 'frozen vegetables'] -> ['olive oil']\n", "Suporte : 0.005732568990801226\n", "Confiança : 0.20574162679425836\n", "Lift : 3.1240241752707125\n", "====================================\n", "['spaghetti', 'frozen vegetables'] -> ['shrimp']\n", "Suporte : 0.005999200106652446\n", "Confiança : 0.21531100478468898\n", "Lift : 3.0131489680782684\n", "====================================\n", "['spaghetti', 'frozen vegetables'] -> ['tomatoes']\n", "Suporte : 0.006665777896280496\n", "Confiança : 0.23923444976076558\n", "Lift : 3.4980460188216425\n", "====================================\n", "['spaghetti', 'grated cheese'] -> ['ground beef']\n", "Suporte : 0.005332622317024397\n", "Confiança : 0.3225806451612903\n", "Lift : 3.283144395325426\n", "====================================\n", "['mineral water', 'herb & pepper'] -> ['ground beef']\n", "Suporte : 0.006665777896280496\n", "Confiança : 0.39062500000000006\n", "Lift : 3.975682666214383\n", "====================================\n", "['spaghetti', 'herb & pepper'] -> ['ground beef']\n", "Suporte : 0.006399146780429276\n", "Confiança : 0.3934426229508197\n", "Lift : 4.004359721511667\n", "====================================\n", "['ground beef', 'milk'] -> ['olive oil']\n", "Suporte : 0.004932675643247567\n", "Confiança : 0.22424242424242427\n", "Lift : 3.40494417862839\n", "====================================\n", "['ground beef', 'shrimp'] -> ['spaghetti']\n", "Suporte : 0.005999200106652446\n", "Confiança : 0.5232558139534884\n", "Lift : 3.005315360233627\n", 
"====================================\n", "['spaghetti', 'milk'] -> ['olive oil']\n", "Suporte : 0.007199040127982935\n", "Confiança : 0.20300751879699247\n", "Lift : 3.0825089038385434\n", "====================================\n", "['mineral water', 'soup'] -> ['olive oil']\n", "Suporte : 0.005199306759098787\n", "Confiança : 0.22543352601156072\n", "Lift : 3.4230301186492245\n", "====================================\n", "['spaghetti', 'pancakes'] -> ['olive oil']\n", "Suporte : 0.005065991201173177\n", "Confiança : 0.20105820105820105\n", "Lift : 3.0529100529100526\n", "====================================\n", "['mineral water', 'spaghetti', 'milk'] -> ['frozen vegetables']\n", "Suporte : 0.004532728969470737\n", "Confiança : 0.28813559322033894\n", "Lift : 3.0228043143297376\n", "====================================\n" ] } ], "source": [ "for i in range(0, len(resultados)):\n", " ant = [x for x in resultados[i][2][0][0]]\n", " cons = [x for x in resultados[i][2][0][1]]\n", " print(ant, \"->\", cons)\n", " print(\"Suporte :\", resultados[i][1])\n", " print(\"Confiança : \",resultados[i][2][0][2])\n", " print(\"Lift : \",resultados[i][2][0][3])\n", " print(\"====================================\")" ] }, { "cell_type": "markdown", "id": "9f440c5f", "metadata": {}, "source": [ "
## Exercises\n", "
\n", "\n", "1. Consider the `MateriaisConstrucao.xlsx` dataset, which records the quantity of each item purchased at a building-materials store. Are some items associated with others in customers' purchases? Use association rules to answer." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }